IMS Collections 

Pushing the Limits of Contemporary Statistics: Contributions in Honor of 
Jayanta K. Ghosh 

Vol. 3 (2008) 170-186 

© Institute of Mathematical Statistics, 2008 
DOI: 10.1214/074921708000000138 



Remarks on consistency of posterior 
distributions 


Taeryon Choi 1 and R. V. Ramamoorthi 2 

Inha University and Michigan State University

Abstract: In recent years, the literature in the area of Bayesian asymptotics
has been rapidly growing. It is increasingly important to understand the concept
of posterior consistency and validate specific Bayesian methods, in terms
of consistency of posterior distributions. In this paper, we build up some conceptual
issues in consistency of posterior distributions, and discuss panoramic
views of them by comparing various approaches to posterior consistency that
have been investigated in the literature. In addition, we provide interesting
results on posterior consistency that deal with non-exponential consistency,
improper priors and non i.i.d. (independent but not identically distributed)
observations. We describe a few examples for illustrative purposes.

1. Introduction



Let $\theta$ be an unknown parameter and $X_1, X_2, \ldots, X_n$ be $n$ random variables whose
joint distribution is $P_\theta^{(n)}$. In order to draw inferences on $\theta$, a Bayesian posits a
prior distribution $\Pi$ for $\theta$ and updates this prior to the posterior distribution given
$X_1, X_2, \ldots, X_n$, which we denote by $\Pi(\cdot \mid X_1, X_2, \ldots, X_n)$. This paper focuses on
some issues related to an asymptotic aspect of this posterior distribution, namely,
consistency.

The sequence of posterior distributions $\{\Pi(\cdot \mid X_1, X_2, \ldots, X_n)\}$ is said to be consistent
at $\theta_0$ if the posterior converges, in a suitable sense, to the degenerate measure
at $\theta_0$.

Posterior consistency is a kind of frequentist validation of the updating method. 
If an oracle were to know the true value of the parameter, posterior consistency 
ensures that with enough observations one would get close to this true value. Pos- 
terior consistency also assures that as more and more observations accumulate, the 
observations have to dominate the role of the prior in inference. There are other 
interpretations related to merging of opinions and other concepts. We refer the 
reader to Diaconis and Freedman [10]. 

In order to set the perspective for this paper we begin with a short summary 
of earlier results in posterior consistency. Details and additional references can 
be found in Ghosh and Ramamoorthi [19]. The first posterior consistency result 
goes back to Laplace. In more recent times posterior consistency and asymptotic 
normality of the posterior were established for regular finite dimensional models. In 
a seminal paper, Freedman [12] gave a nonparametric example, with integer-valued
observations, where the posterior is inconsistent. In [10], Diaconis and Freedman
showed that in the nonparametric case inconsistency can occur, even in location
models with a Euclidean parameter. They suggested that instead of searching for
priors that would be consistent at all unknown values of the parameter it would be 
fruitful to study natural priors and identify points of consistency. 

On the positive side, Freedman [12, 13] and soon after Schwartz [25] provided 
conditions under which the posterior probability of a set $A$ will go to 0. These conditions
involved two parts, one on prior positivity of Kullback-Leibler neighborhoods
and the other on existence of certain test functions. Under the assumption of prior
positivity of Kullback-Leibler neighborhoods, Barron gave necessary and sufficient
conditions for the posterior probability of $A$ to go to 0. These results were then
specialized to weak and $L_1$ neighborhoods by Barron et al. [3], Ghosal et al. [15]
and Walker [32].

One aspect of these results was that they all established exponential consistency. 
In this paper we first give a quick review of these results from a slightly different 
perspective with a focus on the role of exponential consistency. We then give an ex- 
ample where there is consistency but not exponential consistency. The example also 
shows that the exponential aspect is not driven by the Kullback-Leibler condition. 

Another early result in consistency is due to Doob [11], who showed that posterior
consistency occurs for all $\theta$ in a set of prior measure one. In this paper we consider
a study of the non i.i.d. case based on Doob's result, specifically, the simple linear 
regression model. The martingale techniques are not applied here and we discuss 
the connection of posterior consistency with orthogonality of product measures. 

Consistency is just the beginning of Bayesian asymptotics. Issues such as rates
of convergence and asymptotic normality have received quite a lot of attention. Yet 
it appears that even at the level of consistency there are still issues that need to be 
clarified. In this paper, we review some conceptual issues in consistency of posterior 
distributions, and discuss different approaches to posterior consistency that have 
been investigated in the literature; we view this as a followup of Ghosal et al. [14]. 
We have attempted to elucidate those sufficient conditions to establish posterior 
consistency and tie up some loose ends on diverse conceptual issues in consistency 
of posterior distributions. The paper also contains some new results along with a 
brief commentary to the subject. In general, detailed proofs are omitted and given 
only when they are different from standard published materials or when the result 
is unpublished. 

Section 2 contains a summary of some background material, some of the notation
and assumptions used in the paper. Section 3 largely describes known results
although some of the proofs are reorganized. The criterion based on the uniform
strong law of large numbers is new and so far we are not aware of any significant
application. The result might still be of interest because of its similarities to results
on the consistency of nonparametric maximum likelihood estimates (NPMLEs) and
also because it affords a natural extension to non i.i.d. cases. Section 4 specializes
the results in Section 2 to the context of consistency. After a brief discussion of
Schwartz's result, we discuss the known conditions for $L_1$ consistency and the relationship
between these. Section 5 extends the Schwartz theorem to improper priors
and formal posteriors. The result is new even in the parametric case. We have not 
pursued conditions for stronger consistency because improper priors usually arise 
in finite dimensional situations where weak and strong consistency coincide. Sec- 
tion 6 contains an example. All the general consistency results in the literature 
actually establish exponential consistency. In Section 6 we give an example where 
consistency obtains but not exponential consistency. The example surprised us as 
we had believed that, at least in the i.i.d. case, consistency would always be at an 
exponential rate. 

In the last section we study the extension of consistency results to a non i.i.d. 
case. We give an example to show that the analogue of Doob's theorem will not 
always hold, and we prove a Doob theorem for the linear regression model with 
nonparametric errors. We also briefly discuss an extension of the theorem of Walker. 

2. Preliminaries 

In the setup that we consider, $\Theta$ is the parameter space; $\{f_\theta : \theta \in \Theta\}$ is a family of
densities with respect to a $\sigma$-finite measure $\mu$ on a measurable space $\mathcal{X}$. We will use
$P_\theta$ to denote the probability distribution generated by $f_\theta$. Throughout the paper
we assume that $\Theta$ and $\mathcal{X}$ are complete separable metric spaces and we also assume
that $\theta \mapsto f_\theta$ is 1-1 and $(\theta, x) \mapsto f_\theta(x)$ is measurable.

The affinity, $\mathrm{Aff}(f, g)$, between any two densities is defined as $\mathrm{Aff}(f,g) =
\int \sqrt{f}\sqrt{g}\,d\mu$. Let $\Pi$ be a prior distribution, i.e., a probability measure on $\Theta$. Given
$\theta$, $X_1, X_2, \ldots, X_n$ are assumed to be i.i.d. $P_\theta$. $f_\theta^{(n)}(x_1, x_2, \ldots, x_n)$ will stand for the
joint density $\prod_{i=1}^n f_\theta(x_i)$.

The Kullback-Leibler (KL) divergence is denoted by $K(\theta_0, \theta) = E_{\theta_0} \log(f_{\theta_0}/f_\theta)$.
A KL neighborhood $K_\varepsilon(\theta_0)$ of $\theta_0$ is denoted by $\{\theta : K(\theta_0, \theta) < \varepsilon\}$.
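
As a quick illustration of these definitions (a worked example that is not part of the original argument, and that assumes the normal location family in which $f_\theta$ is the $N(\theta, 1)$ density on the real line), the KL divergence and its neighborhoods can be written in closed form:

```latex
% Worked example (assumption: f_theta is the N(theta, 1) density).
\begin{align*}
K(\theta_0,\theta)
  &= E_{\theta_0}\Bigl[\tfrac12 (X-\theta)^2 - \tfrac12 (X-\theta_0)^2\Bigr]
   = \tfrac12(\theta-\theta_0)^2, \\
K_\varepsilon(\theta_0)
  &= \{\theta : |\theta-\theta_0| < \sqrt{2\varepsilon}\}.
\end{align*}
```

In this family a KL neighborhood is simply an open interval around $\theta_0$, so a prior that charges every open interval around $\theta_0$ gives each such neighborhood positive mass.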

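Definition 2.1. A point $\theta_0$ is said to be in the KL support of $\Pi$ if for all $\varepsilon > 0$,
$\Pi(K_\varepsilon(\theta_0)) > 0$.

The posterior distribution $\Pi(A \mid X_1, X_2, \ldots, X_n)$, the version that we consider, is
given by the following. For any measurable subset $A$ of $\Theta$,

$$\Pi(A \mid X_1, X_2, \ldots, X_n) = \frac{J_A(X_1, X_2, \ldots, X_n)}{J(X_1, X_2, \ldots, X_n)}, \tag{2.1}$$

where

$$J_A(X_1, X_2, \ldots, X_n) = \int_A \frac{f_\theta^{(n)}}{f_{\theta_0}^{(n)}}(X_1, X_2, \ldots, X_n)\,\Pi(d\theta)$$

and

$$J(X_1, X_2, \ldots, X_n) = \int_\Theta \frac{f_\theta^{(n)}}{f_{\theta_0}^{(n)}}(X_1, X_2, \ldots, X_n)\,\Pi(d\theta).$$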



3. Exponential decrease to 0

We begin with a review of results that provide conditions under which, for a measurable
subset $A$ of $\Theta$, $\Pi(A \mid X_1, X_2, \ldots, X_n)$ goes to 0 exponentially with probability 1.

Definition 3.1. Let $\theta_0 \in \Theta$ and let $P_{\theta_0}^\infty$ stand for the joint distribution of $\{X_i\}_{i=1}^\infty$
when $\theta_0$ is the true value of $\theta$. Then $\Pi(A \mid X_1, X_2, \ldots, X_n)$ is said to go to 0 exponentially
with $P_{\theta_0}^\infty$ probability 1 if there exists a $\beta > 0$ such that

$$P_{\theta_0}^\infty\bigl(\{\Pi(A \mid X_1, X_2, \ldots, X_n) > e^{-n\beta} \text{ i.o.}\}\bigr) = 0,$$

where i.o. stands for "infinitely often."

Proposition 3.2 goes back to [12] and [25]. For a proof see [19, Lemma 4.4.1].

Proposition 3.2. If $\theta_0$ is in the KL support of $\Pi$ then for all $\beta > 0$,

$$\lim_{n \to \infty} e^{n\beta} J(X_1, X_2, \ldots, X_n) = \infty \quad \text{a.s. } P_{\theta_0}^\infty.$$
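
The idea behind Proposition 3.2 can be sketched as follows. This is the standard argument as we understand it (strong law of large numbers plus Fatou's lemma), written for the i.i.d. setup above with $J$ the integrated likelihood ratio in (2.1); the cited proof in [19] should be consulted for the details.

```latex
% Sketch: fix 0 < eps < beta and restrict the integral to K_eps(theta_0).
\begin{align*}
e^{n\beta} J(X_1,\ldots,X_n)
  &\ge \int_{K_\varepsilon(\theta_0)}
     \exp\Bigl\{ n\Bigl[\beta + \frac1n \sum_{i=1}^n
       \log\frac{f_\theta}{f_{\theta_0}}(X_i)\Bigr]\Bigr\}\,\Pi(d\theta), \\
\frac1n \sum_{i=1}^n \log\frac{f_\theta}{f_{\theta_0}}(X_i)
  &\to -K(\theta_0,\theta) > -\varepsilon
  \quad\text{a.s. } P_{\theta_0}^\infty,\ \text{for each } \theta \in K_\varepsilon(\theta_0).
\end{align*}
% Since beta - eps > 0, the integrand tends to infinity for each such theta.
% By Fubini this holds, for almost every sample sequence, for Pi-almost all
% theta in K_eps(theta_0); Fatou's lemma and Pi(K_eps(theta_0)) > 0 then give
% the divergence of the left-hand side.
```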

Proposition 3.2 shows that the Kullback-Leibler support condition takes care of
the denominator in (2.1). The exponential convergence to 0 would follow if it can
be established that there exists $\beta_0 > 0$ such that $e^{n\beta_0} J_A(X_1, X_2, \ldots, X_n) \to 0$ a.s.
$P_{\theta_0}^\infty$. We explore sufficient conditions to achieve this.

Definition 3.3. For a probability measure $\nu$ on $\Theta$, let $q_\nu^{(n)}$ be the marginal density
of $X_1, \ldots, X_n$,

$$q_\nu^{(n)}(x_1, x_2, \ldots, x_n) = \int_\Theta f_\theta^{(n)}(x_1, x_2, \ldots, x_n)\,\nu(d\theta).$$

Definition 3.4. Let $A \subset \Theta$ and $\delta > 0$. The set $A$ and $\theta_0$ are said to be strongly $\delta$
separated if for any probability $\nu$ on $A$,

$$\mathrm{Aff}(f_{\theta_0}, q_\nu^{(1)}) < \delta.$$

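The relationship $H^2(f,g) = 1 - \mathrm{Aff}(f,g)$ between the Hellinger distance $H(f, g)$,
normalized so that $H^2(f,g) = \tfrac12 \int (\sqrt{f} - \sqrt{g})^2\,d\mu$,
and the affinity $\mathrm{Aff}(f, g)$ shows that $\mathrm{Aff}(f_{\theta_0}, q_\nu^{(1)}) < \delta$ is equivalent to
$H^2(f_{\theta_0}, q_\nu^{(1)}) > 1 - \delta$. Say that $A$ and $\theta_0$ are strongly separated if they are strongly $\delta$
separated for some $\delta > 0$.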

Example 3.5. Suppose that the $L_1$ distance between $f_{\theta^*}$ and $f_{\theta_0}$ is larger than $\delta^*$
for some $\delta^* > 0$, $\|f_{\theta^*} - f_{\theta_0}\| > \delta^*$. Let

A={e:\\fg.-fg\\< 

It is easy to see that $A$ is strongly separated from $\theta_0$ for every $\nu$ on $A$.

We begin by isolating a useful consequence of strong separation. The underlying
idea is in [32]. Note that the argument is essentially analytic and does not use
Hoeffding's inequality as in [19]. Lemma 3.6, we believe, can be extended to non-i.i.d.
and even to non-independent cases. We do not pursue this here but will briefly
return to it in Section 7.

Lemma 3.6. If $\theta_0$ and $A$ are strongly $\delta$ separated then for all probability $\nu$ on $A$,
for all $n$,

$$\mathrm{Aff}(f_{\theta_0}^{(n)}, q_\nu^{(n)}) < e^{-n\beta_0}, \quad \text{where } \beta_0 = -\log\delta. \tag{3.1}$$

Proof. The proof is straightforward by induction on n, combined with the definition 
of strong separation. □ 
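
The bound in Lemma 3.6 can be checked numerically in a toy case. The sketch below is only an illustration under assumptions of our own choosing: Bernoulli observations, a hypothetical true value 0.5, and a hypothetical two-point set $A$ corresponding to success probabilities 0.1 and 0.2. It takes $\delta$ to be the largest one-observation affinity over mixing measures on $A$ (here attained, so the bound is checked in the non-strict form $\le \delta^n$) and compares it with the $n$-fold affinity computed by brute-force enumeration.

```python
import itertools
import math

def affinity_n(p0, ps, weights, n):
    """Affinity between the n-fold Bernoulli(p0) density and the mixture q_nu^(n)."""
    total = 0.0
    for xs in itertools.product((0, 1), repeat=n):
        k = sum(xs)  # number of ones in the sample
        f0 = p0 ** k * (1 - p0) ** (n - k)
        # Mixture of n-fold product densities with mixing weights nu.
        q = sum(w * p ** k * (1 - p) ** (n - k) for w, p in zip(weights, ps))
        total += math.sqrt(f0 * q)
    return total

P0 = 0.5          # hypothetical true parameter theta_0
PS = (0.1, 0.2)   # hypothetical two-point set A
GRID = [i / 100 for i in range(101)]

# One-observation mixtures are Bernoulli(0.1 w + 0.2 (1 - w)); the affinity with
# Bernoulli(0.5) increases in the success probability below 0.5, so the maximum
# over mixing weights is attained at w = 0, which lies on the grid.
DELTA = max(affinity_n(P0, PS, (w, 1 - w), 1) for w in GRID)

def check_bound(n, w):
    """True if Aff(f0^(n), q_nu^(n)) <= delta^n for nu = (w, 1 - w)."""
    return affinity_n(P0, PS, (w, 1 - w), n) <= DELTA ** n + 1e-12

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, affinity_n(P0, PS, (0.5, 0.5), n), DELTA ** n)
```

The enumeration is exponential in $n$ and is meant only for small $n$; the point is that the mixture affinity decays at least geometrically, as the induction argument predicts.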

Remark 3.1. The conclusion of Lemma 3.6 holds with $\beta_0 = -\log\delta/k$ if for all
$\nu$, for some $k$, $\mathrm{Aff}(f_{\theta_0}^{(k)}, q_\nu^{(k)}) < \delta$, i.e., $A$ and $\theta_0$ are strongly separated for the
parametrization $\theta \mapsto f_\theta^{(k)}$.

The next result is the celebrated result of Schwartz [25] stated in terms of strong 
separation. A result of LeCam [21] shows that it is equivalent to the formulation of 
Schwartz involving an unbiased test for testing $H_0 : \theta = \theta_0$ vs. $H_1 : \theta \in A$. LeCam's
theorem is proved using the Hahn-Banach theorem so is essentially an existence 
result. Hence the point of view of strong separation could be an easier condition to 
verify in some situations. 

Theorem 3.7 (Schwartz). If

(1) $\theta_0$ is in the KL support of $\Pi$,

(2) for some $k$, $A$ and $\theta_0$ are strongly separated for the parametrization $\theta \mapsto f_\theta^{(k)}$,

then $\Pi(A \mid X_1, X_2, \ldots, X_n)$ goes to 0 exponentially a.e. $P_{\theta_0}^\infty$.

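Proof. Let $\Pi^*$ be the probability measure obtained by restricting $\Pi$ to $A$ and normalizing
it. Then

$$\begin{aligned}
P_{\theta_0}\bigl\{\sqrt{J_A} > e^{-n\gamma}\bigr\} &\le e^{n\gamma} E_{\theta_0}\bigl(\sqrt{J_A}\bigr) \\
&= e^{n\gamma} \sqrt{\Pi(A)}\,\mathrm{Aff}(f_{\theta_0}^{(n)}, q_{\Pi^*}^{(n)}) \\
&\le \sqrt{\Pi(A)}\, e^{n\gamma} e^{-n\beta_0}.
\end{aligned}$$

Taking $\gamma = \beta_0/4$, it follows easily that

$$P_{\theta_0}\bigl\{\sqrt{J_A} > e^{-n\gamma} \text{ i.o.}\bigr\} = 0.$$

The proof can be completed easily using Proposition 3.2. For details see [19]. □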
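
The concluding steps that the proof leaves to the reader can be spelled out as follows. This is our reading of the standard argument (Markov's inequality followed by the Borel-Cantelli lemma and Proposition 3.2), stated for any fixed $0 < \gamma < \beta_0$ and assuming the Markov bound $P_{\theta_0}\{\sqrt{J_A} > e^{-n\gamma}\} \le \sqrt{\Pi(A)}\,e^{n\gamma}e^{-n\beta_0}$; it is a sketch rather than the exact bookkeeping of [19].

```latex
% Sketch of the concluding steps, for any fixed 0 < gamma < beta_0.
\begin{align*}
\sum_{n\ge 1} P_{\theta_0}\bigl\{\sqrt{J_A} > e^{-n\gamma}\bigr\}
  &\le \sqrt{\Pi(A)} \sum_{n\ge 1} e^{-n(\beta_0-\gamma)} < \infty
  && \text{so } J_A \le e^{-2n\gamma} \text{ eventually, a.s.} \\
e^{n\gamma} J(X_1,\ldots,X_n) &\to \infty
  && \text{so } J \ge e^{-n\gamma} \text{ eventually, a.s.} \\
\Pi(A \mid X_1,\ldots,X_n) = \frac{J_A}{J}
  &\le \frac{e^{-2n\gamma}}{e^{-n\gamma}} = e^{-n\gamma}
  && \text{for all large } n, \text{ a.s. } P_{\theta_0}^\infty.
\end{align*}
```
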
Proposition 3.2 and Lemma 3.6 easily give the following theorem of Walker [32].

Theorem 3.8. If

(1) $\theta_0$ is in the KL support of $\Pi$,

(2) $A = \bigcup_{i \ge 1} A_i$ such that:

(a) for some $\delta > 0$ all the $A_i$'s are strongly $\delta$ separated from $\theta_0$ and

(b) $\sum_{i \ge 1} \sqrt{\Pi(A_i)} < \infty$,

then $\Pi(A \mid X_1, X_2, \ldots, X_n)$ goes to 0 exponentially a.e. $P_{\theta_0}^\infty$.
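
Proof. It follows by noting that, since $\sqrt{J_A} \le \sum_i \sqrt{J_{A_i}}$,

$$\begin{align}
P_{\theta_0}\bigl\{\sqrt{J_A} > e^{-n\gamma}\bigr\} \le e^{n\gamma} \sum_i E_{\theta_0}\bigl(\sqrt{J_{A_i}}\bigr)
&= e^{n\gamma} \sum_i \sqrt{\Pi(A_i)}\,\mathrm{Aff}(f_{\theta_0}^{(n)}, q_{\Pi_i^*}^{(n)}) \tag{3.2} \\
&\le e^{n\gamma} e^{-n\beta_0} \sum_i \sqrt{\Pi(A_i)}, \notag
\end{align}$$

where $\Pi_i^*$ in (3.2) is the normalized restriction of $\Pi$ to $A_i$. □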



The next theorem gives another set of sufficient conditions, in terms of the uniform
Strong Law of Large Numbers (SLLN), for the posterior probability of a set
to go to 0 exponentially. The conditions are stronger than those of Schwartz [25].
They are similar in spirit to the conditions used in the study of Hellinger consistency
of NPMLEs (see [30]) and suggest a parallel between consistency of NPMLEs
and posterior consistency.

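Theorem 3.9. Let $A \subset \Theta$. If

(1) $\theta_0$ is in the KL support of $\Pi$,

(2) $\mathrm{Aff}(f_{\theta_0}, f_\theta) \le \delta$ for all $\theta \in A$,

(3) $\sup_{\theta \in A} \Bigl| \int \sqrt{f_\theta / f_{\theta_0}}\,(x)\,dP_n - \mathrm{Aff}(f_{\theta_0}, f_\theta) \Bigr| \to 0$ a.s. $P_{\theta_0}^\infty$, where $P_n$ is the empirical
distribution obtained from $X_1, X_2, \ldots, X_n$,

then $\Pi(A \mid X_1, X_2, \ldots, X_n)$ goes to 0 exponentially a.e. $P_{\theta_0}^\infty$.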

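Proof. Note that $\int g(x)\,dP_n = (1/n) \sum_{i=1}^n g(X_i)$ for an arbitrary function $g(x)$. Thus,

$$\begin{aligned}
J_A &= \int_A \exp\Bigl\{2n \int \log \sqrt{\frac{f_\theta}{f_{\theta_0}}}\,(x)\,dP_n\Bigr\}\,\Pi(d\theta) \\
&\le \int_A \exp\Bigl\{2n \Bigl[\int \sqrt{\frac{f_\theta}{f_{\theta_0}}}\,(x)\,dP_n - 1\Bigr]\Bigr\}\,\Pi(d\theta)
\qquad \text{since } \log x \le x - 1.
\end{aligned}$$

Take $\delta^* = 1 - \delta$. By assumptions 2 and 3, for all large $n$,

$$\begin{aligned}
\sup_{\theta \in A} \int \sqrt{\frac{f_\theta}{f_{\theta_0}}}\,dP_n
&\le \sup_{\theta \in A} \Bigl| \int \sqrt{\frac{f_\theta}{f_{\theta_0}}}\,dP_n - \mathrm{Aff}(f_{\theta_0}, f_\theta) \Bigr| + \sup_{\theta \in A} \mathrm{Aff}(f_{\theta_0}, f_\theta) \\
&< \frac{\delta^*}{2} + 1 - \delta^* = 1 - \delta^*/2,
\end{aligned}$$

which in turn implies that $J_A < \Pi(A) \exp(-n\delta^*/2)$. □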



Proposition 3.10. Conditions (2) and (3) of Theorem 3.9 imply that there exists
a uniformly consistent test for $H_0 : \theta = \theta_0$ vs. $H_1 : \theta \in A$.
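
Proof. Choose $\delta_0 > 0$ such that $\delta + \delta_0 < 1$ and set $\beta_0 = 1 - (\delta + \delta_0)$.
Let

$$C = \Bigl\{ (x_1, x_2, \ldots, x_n) : \sup_{\theta \in A} \Bigl| \int \sqrt{\frac{f_\theta}{f_{\theta_0}}}\,(x)\,dP_n - \mathrm{Aff}(f_{\theta_0}, f_\theta) \Bigr| < \delta_0 \Bigr\}.$$

By assumption (3) of Theorem 3.9, for any $\varepsilon > 0$ and sufficiently large $n$,
$P_{\theta_0}^{(n)}(C) > 1 - \varepsilon$. For each $(x_1, x_2, \ldots, x_n)$ in $C$, for all $\theta \in A$,

$$\int \sqrt{\frac{f_\theta}{f_{\theta_0}}}\,(x)\,dP_n < \sup_{\theta \in A} \mathrm{Aff}(f_{\theta_0}, f_\theta) + \delta_0 \le \delta + \delta_0,$$

so that, using $\log x \le x - 1$ as in the proof of Theorem 3.9,

$$\frac12 \log \frac{f_\theta^{(n)}}{f_{\theta_0}^{(n)}}(x_1, x_2, \ldots, x_n) < n(\delta + \delta_0 - 1) = -n\beta_0.$$

Therefore, for $\theta \in A$, $f_\theta^{(n)} < e^{-2n\beta_0} f_{\theta_0}^{(n)}$ on $C$, and hence $P_\theta^{(n)}(C) \le e^{-2n\beta_0}$. □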



Remark 3.2. Salinetti [24] has used the notion of hypo convergence to study con- 
sistency of posterior and consistency of maximum likelihood estimates. A somewhat 
related result is due to Ghosal and van der Vaart who show that her condition is 
related to Schwartz's testing condition in the discussion of [24]. 

While the results discussed so far deal with sufficient conditions for $\Pi(A \mid X_1, \ldots, X_n)$
to go to 0 exponentially, the next basic result due to Barron [2] gives conditions
that are both necessary and sufficient.

Theorem 3.11 (Barron). Let $A \subset \Theta$. Assume that $\theta_0$ is in the KL support of $\Pi$. Then
the following are equivalent.

(i) There exists a $\beta_0$ such that

$$P_{\theta_0}\{\Pi(A \mid X_1, X_2, \ldots, X_n) > e^{-n\beta_0} \text{ i.o.}\} = 0.$$

(ii) There exist subsets $V_n, W_n$ of $\Theta$, positive numbers $c_1, c_2, \beta_1, \beta_2$, and a sequence
of tests $\{\phi_n\}$, $\phi_n$ based on $n$ observations, such that

(a) $A \subset V_n \cup W_n$,

(b) $\Pi(W_n) \le c_1 e^{-n\beta_1}$, and

(c) $P_{\theta_0}\{\phi_n > 0 \text{ i.o.}\} = 0$ and $\inf_{\theta \in V_n} E_\theta \phi_n \ge 1 - c_2 e^{-n\beta_2}$.



4. Consistency 

As before, $\Pi$ stands for a prior and $\{\Pi(\cdot \mid X_1, X_2, \ldots, X_n)\}$ denotes a sequence of
posterior distributions. The sequence of posteriors is said to be consistent at $\theta_0$ if
$\Pi(U \mid X_1, X_2, \ldots, X_n) \to 1$ a.s. $P_{\theta_0}^\infty$ for all neighborhoods $U$ of $\theta_0$.
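
To make the definition concrete, the following sketch tracks the posterior mass of a neighborhood as the sample size grows. It is an illustration only, under assumptions that are ours and not the paper's: Bernoulli observations, a hypothetical uniform prior on a finite grid of parameter values, and idealized data whose success frequency equals $\theta_0$ exactly, so it shows the mechanics of concentration rather than an almost-sure statement.

```python
import math

def posterior_on_grid(grid, prior, successes, n):
    """Posterior weights for a Bernoulli parameter restricted to a finite grid."""
    log_post = [
        math.log(w) + successes * math.log(t) + (n - successes) * math.log(1 - t)
        for t, w in zip(grid, prior)
    ]
    m = max(log_post)  # subtract the max before exponentiating to avoid underflow
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def neighborhood_mass(theta0, radius, n):
    """Posterior mass of {theta : |theta - theta0| < radius} after n observations."""
    grid = [i / 100 for i in range(1, 100)]   # 0.01, ..., 0.99 (hypothetical grid)
    prior = [1 / len(grid)] * len(grid)       # uniform prior on the grid
    successes = round(n * theta0)             # idealized data: frequency equals theta0
    post = posterior_on_grid(grid, prior, successes, n)
    return sum(p for t, p in zip(grid, post) if abs(t - theta0) < radius)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, neighborhood_mass(0.3, 0.1, n))
```

Because the grid prior puts positive mass on $\theta_0 = 0.3$ itself, the KL support condition holds trivially in this toy setting, and the mass of the neighborhood increases towards 1 with $n$.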

Typically the parametrization $\theta \mapsto f_\theta$ turns out to be continuous when the space
of densities is endowed with weak convergence or with the $L_1$ or the Hellinger
metric. Consequently the neighborhoods of interest are those that arise from weak
or $L_1$ topology.

In view of the last section, what is required then is to verify that the conditions
developed in the last section apply to neighborhoods.
Let g(x) be a bounded measurable function and define 



Clearly $A$ is strongly $\varepsilon$ separated from $\theta_0$ and hence if $\theta_0$ is in the KL support
of $\Pi$ then by Theorem 3.7 the posterior probability of $A$ goes to 0 exponentially. If
$U$ is a weak neighborhood then $U^c$ is a finite union of sets of the type displayed in
(4.1). This establishes exponential consistency for weak neighborhoods.

Consider the $L_1$ neighborhood

$$U = \{\theta : \|f_\theta - f_{\theta_0}\| < \varepsilon\}.$$

In this case, in general, $U^c$ cannot be expressed as a finite union of sets strongly
separated from $\theta_0$. Unlike the case of weak neighborhoods, in this case we need
conditions beyond requiring that $\theta_0$ is in the KL support of $\Pi$.

Theorem 3.8 can be easily adapted in this context. 

Theorem 4.1. Assume

(1) $\theta_0$ is in the KL support of $\Pi$.

(2) For all $\delta > 0$, there exist sets $A_1, A_2, \ldots$ such that the diameter of $A_i$,
$\mathrm{diam}(A_i) < \delta$, $\bigcup A_i = \Theta$ and $\sum \sqrt{\Pi(A_i)} < \infty$.

Then for any $L_1$ neighborhood $U$ of $\theta_0$, the posterior probability of $U^c$ goes to 0
exponentially a.e. $P_{\theta_0}^\infty$.

The theorem follows from observing that if $U$ is an $\varepsilon$ neighborhood, then taking
$\{A_i\}_{i=1}^\infty$ for $\delta = \varepsilon/3$ it is easily seen that the $A_i$'s that have non-empty intersection
with $U^c$ cover $U^c$. These $A_i$'s satisfy the assumptions of Theorem 3.8.

Definition 4.2 (Bracketing entropy). Let $\Gamma \subset \Theta$. For a $\delta > 0$ define the bracketing
entropy $\mathcal{H}(\Gamma, \delta)$ to be the logarithm of the minimum integer $k$ such that there exist
nonnegative functions $f_1^U, f_2^U, \ldots, f_k^U$ satisfying

(1) $\int f_i^U(x)\,\mu(dx) \le 1 + \delta$,

(2) for each $\theta \in \Gamma$ there exists $i$ such that $f_\theta \le f_i^U$.

Definition 4.3 (Metric entropy). Let $\Gamma \subset \Theta$. For $\delta > 0$ the metric entropy $J(\Gamma, \delta)$
is defined to be the logarithm of the minimum of all integers $k$ such that there exist
densities $f_1, f_2, \ldots, f_k$ such that for each $\theta \in \Gamma$ there exists $i$ such that $\|f_\theta - f_i\| < \delta$.

If $\theta_0$ is in the KL support of $\Pi$ then each of the three conditions listed below
ensures that the posterior is exponentially $L_1$ consistent. The first condition (W)
is from Walker's theorem, Theorem 3.8, the next (BSW) is due to [3] and the third
(GGR) appears in [15]. A formal statement and proof can be found in [3] and [15].

(W) For each $\delta > 0$, there exist sets $A_1, A_2, \ldots$ such that $\bigcup A_i = \Theta$, the $L_1$-diameter
of $\{f_\theta : \theta \in A_i\} < \delta$ and $\sum_i \sqrt{\Pi(A_i)} < \infty$.

(BSW) For each $\varepsilon > 0$, there exist $\Theta_n \subset \Theta$, and $c, c_1, c_2, \delta$ all positive such that

(a) $\Pi(\Theta_n^c) \le c_1 e^{-n c_2}$.

(b) $\mathcal{H}(\Theta_n, \delta) \le nc$ for $c < ([\varepsilon - \sqrt{\delta}]^2 - \delta)/2$, $\delta < \varepsilon^2/4$.

(GGR) If for each $\varepsilon > 0$, there is a $0 < \delta < \varepsilon$, $c_1$, $c_2$, $\beta < \varepsilon^2/2$ and $\Theta_n$ such that

(a) $\Pi(\Theta_n^c) < c_1 e^{-n\beta}$.

(b) $J(\Theta_n, \delta) < n\beta$.

The next theorem shows that both (W) and (BSW) imply (GGR). A proof that
(W) $\Rightarrow$ (GGR) was also communicated to us by S. Ghosal [personal communication].

Theorem 4.4. (W) $\Rightarrow$ (GGR) and (BSW) $\Rightarrow$ (GGR).

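Proof. (W) $\Rightarrow$ (GGR).

Assume without loss of generality that $\Pi(A_i) = \Pi_i$ is decreasing in $i$ and let
$\sum_i \sqrt{\Pi_i} = c < \infty$. Set

$$\Theta_n = \bigcup_{i=1}^{k_n} A_i.$$

Since the $L_1$-diameter of $\{f_\theta : \theta \in A_i\} < \delta$, it is easy to see that $J(\Theta_n, \delta) \le \log k_n$.
Thus, by taking $k_n = \exp(n\beta)$ one then obtains sieves with the properties required
by (GGR).

Next, we shall argue that $\Pi(\Theta_n^c) = \Pi(\bigcup_{i > k_n} A_i) \le 2c^2/k_n$. Note that, for any
$j$, $j\sqrt{\Pi_j} \le \sum_{i=1}^j \sqrt{\Pi_i} \le c$ so that $j \le c/\sqrt{\Pi_j}$. Therefore,

$$\Pi\Bigl(\bigcup_{j > k_n} A_j\Bigr) \le \sum_{j > k_n} \Pi_j \le c^2 \sum_{j > k_n} \frac{1}{j^2} \le \frac{2c^2}{k_n}.$$

(BSW) $\Rightarrow$ (GGR).

Let $f_1, f_2, \ldots, f_k$ be functions such that $\int f_i = 1 + c_i \le (1 + \delta)$ and such that for
any $\theta \in \Gamma$, there exists $i$ such that $f_\theta \le f_i$. Let $f_i^* = f_i/(1 + c_i)$. Then

$$\|f_i^* - f_\theta\| \le \frac{1}{1 + c_i} \|f_i - (1 + c_i) f_\theta\| \le \|f_i - f_\theta\| + c_i.$$

Hence $f_1^*, f_2^*, \ldots, f_k^*$ forms a $2\delta$ net for $\Gamma$ and $J(\Gamma, 2\delta) \le \mathcal{H}(\Gamma, \delta)$. □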



5. Improper priors and formal posteriors 

Suppose that $\Pi$ is an improper prior on $\Theta$, that is, a $\sigma$-finite measure with $\Pi(\Theta) = \infty$.
A formal posterior density given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ is defined as in
Equation (2.1). This is of course well defined only if

$$J(x_1, x_2, \ldots, x_n) = \int_\Theta \frac{f_\theta^{(n)}}{f_{\theta_0}^{(n)}}(x_1, x_2, \ldots, x_n)\,\Pi(d\theta) < \infty.$$

This situation occurs widely in the context of noninformative priors (see for 
example, Ghosh and Ramamoorthi [18] and Kass and Wasserman [20]). 

The next theorem shows that if $\theta_0$ is in the KL support of $\Pi$ then the posterior is
weakly consistent. Improper priors largely arise in the context of finite dimensional
regular models and in these situations weak consistency and strong consistency
coincide. Hence, we do not develop conditions akin to (W), (BSW) or (GGR) for
improper priors. First, Lemma 5.1 states a result on the KL support of $\Pi(\cdot \mid x)$.
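
Lemma 5.1. Let $\theta_0$ be in the KL support of $\Pi$. Denote by $A = \{x : J(x) =
\int f_\theta(x)\,\Pi(d\theta) < \infty\}$. Then, for $P_{\theta_0}$ almost all $x$ in $A$, $\theta_0$ is in the KL support of
$\Pi(\cdot \mid x)$.

Proof. Fix $\varepsilon > 0$. Consider $E = \{x \in A : \Pi(K_\varepsilon \mid x) = 0\}$. We shall show that
$P_{\theta_0}(E) = 0$. Note that

$$\Pi(K_\varepsilon \mid x) = \frac{\int_\Theta 1_{K_\varepsilon}(\theta) f_\theta(x)\,\Pi(d\theta)}{\int_\Theta f_\theta(x)\,\Pi(d\theta)}.$$

Denoting by $\Pi^*$ the measure $\Pi(\cdot \cap K_\varepsilon)/\Pi(K_\varepsilon)$, since, for $x \in E$, $\Pi(K_\varepsilon \mid x) = 0$, we
have that

$$\Pi^*\{\theta : f_\theta(x) = 0\} = 1.$$

Consequently $\int_E \int_{K_\varepsilon} f_\theta(y)\,\Pi^*(d\theta)\,d\mu(y) = 0$. Interchanging the integrals,
$\int_{K_\varepsilon} \bigl[\int_E f_\theta(y)\,d\mu(y)\bigr]\,\Pi^*(d\theta) = 0$ and hence there exists some $\theta'$ such that
$\int_E f_{\theta'}(y)\,d\mu(y) = 0$ so that $P_{\theta'}(E) = 0$. For every $\theta$ in $K_\varepsilon$, $P_\theta$ dominates $P_{\theta_0}$, so
$P_{\theta_0}(E) = 0$. Letting $\varepsilon$ run through rationals, the lemma is established. □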

Theorem 5.2. Let $\Pi$ be an improper prior on $\Theta$. $\{f_\theta : \theta \in \Theta\}$ is a family of
densities. Assume that the formal posterior is defined with probability one.
Formally, if

$A_n = \{(x_1, x_2, \ldots, x_n) : J(x_1, x_2, \ldots, x_n) < \infty\}$ then $P_{\theta_0}^\infty(\bigcup A_n) = 1$.

If $\theta_0$ is in the KL support of $\Pi$ then the formal posterior is weakly consistent at $\theta_0$.

Proof. By Lemma 5.1, for each $n$, except for those in a set of $P_{\theta_0}$ measure 0, for all
$(x_1, x_2, \ldots, x_n) \in A_n$, $\theta_0$ is in the KL support of $\Pi(\cdot \mid (x_1, x_2, \ldots, x_n))$.

Since on $A_n$, $\Pi(\cdot \mid (x_1, x_2, \ldots, x_{n+1})) = \Pi_{(x_1, x_2, \ldots, x_n)}(\cdot \mid x_{n+1})$, the result follows. □



6. Example 

All the results discussed so far are related to exponential consistency. The next 
example shows that, even in the context of i.i.d. observations, the posterior can be 
consistent at a non-exponential rate. 

Consider an example where we have a prior $\Pi$, $f_0$ is in the KL support of $\Pi$ and
the posterior is not $L_1$ consistent, i.e., there is a set $A$ which is a complement of a
neighborhood of $f_0$ and whose posterior does not go to 0. Such an example appears,
for instance, in Barron et al. [3].

Consider the prior $\Pi^* = .5\,\delta_{f_0} + .5\,\Pi$. Then by Doob's theorem the posterior
of $A$ goes to 0. It cannot go exponentially, for if it does, by Barron's theorem (e.g.
[2] and [19, Theorem 4.4.3]), there would be sieves $V_n$ and sets $U_n$ of exponentially
small $\Pi^*$ probability that cover $A$. These properties also carry over to $\Pi$ and now the
first part of Barron's result would imply that the original prior $\Pi$ is itself consistent,
in fact, exponentially consistent.

7. Independent but non-identically distributed models

7.1. Extension to posterior consistency 

Here we look at the setup where, as before, $\Theta$ is a parameter space and $\Pi$ is a prior
on $\Theta$. Given $\theta$, we assume that $X_1, X_2, \ldots$ are independent with $X_i$ distributed as
$f_{i,\theta}$.

All the results discussed so far can be easily adapted, but not necessarily easily
applied, in the non-identically distributed case. As in Section 1, the posterior can be
written as the ratio of two integrals. A stronger form of KL support (for instance,
see Choudhuri et al. [9] and Amewou-Atisso et al. [1]) takes care of the denominator.
It is not clear if there is a simple version of the (GGR) type of sufficient condition.
Instead, the results for independent but non-identically distributed models in
Amewou-Atisso et al. [1], Choudhuri et al. [9], Ghosal and Roy [16] and Choi and
Schervish [8] establish the existence of uniformly consistent tests directly,
which makes the numerator in the ratio of two integrals decrease to 0 exponentially.

Alternatively, LeCam [22] and Birge [4] showed that for independent 
non-identically distributed variables, tests with exponentially small errors exist 
when we use the average squared Hellinger distance to separate densities and convex
sets. That is, uniformly consistent tests are always obtained if the entropy with
such a distance is controlled. In the recent paper by Ghosal and van der Vaart 
[17], (GGR) type results have been investigated in the test construction for the 
convergence rates of posterior distributions for non i.i.d. observations. 

On the other hand, Walker's sufficient conditions are easily adaptable in this
case. Note that the proof of Lemma 3.6 does not require the assumption of
identically distributed observations; hence Theorem 3.8 extends easily to this case.
We state it formally below.

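Theorem 7.1. If $A = \bigcup_{i \ge 1} A_i$ such that

1. For some $\delta > 0$ all the $A_i$'s are strongly $\delta$ separated from $\theta_0$ for the model
$\theta \mapsto f_{i,\theta}$ and

2. $\sum_{i \ge 1} \sqrt{\Pi(A_i)} < \infty$.

Then for some $\beta_0 > 0$,

$$e^{n\beta_0} \int_A \prod_{i=1}^n \frac{f_{i,\theta}}{f_{i,\theta_0}}(X_i)\,\Pi(d\theta) \to 0 \quad \text{a.s. } \prod_{i=1}^\infty P_{i,\theta_0}.$$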

Similar results to Theorem 7.1 along with regression problems have been dis- 
cussed in Walker [33] . 

Example 7.2 (Orthogonal series expansion). Let

$$Y_i = \eta(X_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{7.1}$$

where the $\epsilon_i$'s are assumed to be independent $N(0,1)$ random variables, the $X_i$'s
are sampled from a known probability distribution, and $\eta(\cdot)$ is a regression function.
An orthogonal series expansion for the regression function $\eta(x)$ is a representation
of $\eta(x)$ by an infinite sum,

$$\eta(x) = \sum_{j=1}^\infty \psi_j \phi_j(x),$$

where $\{\phi_j(x)\}_{j=1}^\infty$ is an orthonormal basis for an $L_2$ space containing $\eta$. Regarding
either posterior consistency or rate of convergence of posterior distributions, this
model has been investigated by Shen and Wasserman [26], Walker [33] and Choi
and Schervish [8].

Let $\{\phi_j(\cdot)\}_{j=1}^\infty$ be an orthonormal basis for $L_2[0, 1]$ such that for some $C > 0$,
$\sup_{x \in [0,1]} |\phi_j(x)| \le C$ for all $j$.

In this case, we consider the following $\delta$-covering of $\ell_2$, a union of sets of the type

$$\{\psi : n_j \delta_j < \psi_j < (n_j + 1)\delta_j,\ j = 1, 2, \ldots\}, \tag{7.2}$$

which was also examined for Hellinger consistency in density estimation problems
from infinite-dimensional exponential families in Walker [32] and regression problems
in Walker [33]. Based on (7.2), condition 2 in Theorem 7.1 can be verified
as in Section 6.1 of [32]. When the regression function is uniformly bounded, the $L_1$ (or
Hellinger) neighborhood of the true density $f_{\theta_0}$ becomes equivalent to the $L_1$ neighborhood
of the true regression function $\eta_0$. Therefore, by considering a $\delta$-covering
in (7.2) and its corresponding prior probability, conditions 1 and 2 are easily
verified. Hence, the conclusion of Theorem 7.1 is achieved when $A$ is in the $L_1$
neighborhood of the true density generating the regression model (7.1).

Example 7.3 (Gaussian process regression). Gaussian process regression is one of
the popular approaches to Bayesian nonparametric regression problems, and it is
used to model the regression function $\eta(x)$ as a Gaussian process a priori. Posterior
consistency based on Gaussian processes has been established in Ghosal and
Roy [16] and Choi [7] for nonparametric binary regression, Tokdar and Ghosh [29]
for density estimation and Choi [ii] and Choi and Schervish [8] for nonparametric
regression. Interestingly, all the results mentioned above have been based on constructing
uniformly consistent tests rather than on condition 2 in Theorem 7.1.
The challenge in the study of posterior consistency based on Gaussian processes
is to find the rate at which the prior probability shrinks as we consider a sequence of
$\delta$-coverings that satisfies condition 2. In this case, the important task to be
achieved is obtaining the exponentially small lower bound for small balls of Gaussian
processes. There is a recent investigation in this regard (e.g. see Li and Shao
[23] and van der Vaart and van Zanten [31]). It would be interesting to explore if
this difficulty in verifying condition 2 under Gaussian process priors can be bypassed when
we apply Theorem 7.1.

7.2. Doob's theorem 

Doob [11] showed that when $\Theta$ is the parameter space and given $\theta$, $X_1, X_2, \ldots$
are i.i.d. $P_\theta$ then, for any sequence of posterior distributions $\Pi(\cdot \mid X_1, X_2, \ldots, X_n)$,
under mild set theoretic assumptions (for instance when $\mathcal{X}$ and $\Theta$ are Borel subsets
of Polish spaces) for any prior $\Pi$, there is a $\Theta_\Pi \subset \Theta$ with $\Pi(\Theta_\Pi) = 1$ such that the
posterior is consistent at all $\theta \in \Theta_\Pi$. In what follows we explore the analogue of
Doob's theorem in independent non-identically distributed models.

To change the notation a bit, given $\theta$ in $\Theta$, let $Y_1, Y_2, \ldots$ be $\mathcal{Y}$ valued random
variables with joint distribution $P_{\theta,\infty}$. For any prior $\Pi$ on $\Theta$, denote by $\lambda_\Pi$ the
joint distribution induced on $\Theta \times \mathcal{Y}^\infty$ by $\Pi$ and $\{P_{\theta,\infty} : \theta \in \Theta\}$. We will denote the
elements of $\mathcal{Y}^\infty$ by $\mathbf{y}$ and $(Y_1, Y_2, \ldots)$ by $\mathbf{Y}$. As before $\Pi(\cdot \mid Y_1, Y_2, \ldots, Y_n)$
will stand for a fixed version of the posterior distribution of $\theta$.

By going through an appropriate countable set of continuous functions $g$ and
applying the martingale convergence theorem to each posterior mean of $g(\theta)$, it can
be seen that there is a conditional probability $\Pi^*(\cdot \mid \mathbf{y})$ such that for all $\mathbf{y}$ outside a
$\lambda_\Pi$ null set

$$\Pi(\cdot \mid Y_1, Y_2, \ldots, Y_n) \to \Pi^*(\cdot \mid \mathbf{y}).$$

Clearly the posterior is consistent at $\theta$ if $\Pi^*(\cdot \mid \mathbf{y}) = \delta_\theta$ a.e. $P_{\theta,\infty}$.

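Proposition 7.4. Consider the following two sets of statements for a given prior
$\Pi$:

1. There is a set $\Theta_\Pi$ with $\Pi(\Theta_\Pi) = 1$ and the posterior is consistent at all $\theta \in \Theta_\Pi$.

2. There is a set $\Theta_\Pi$ with $\Pi(\Theta_\Pi) = 1$ and a measurable set $E^\Pi \subset \Theta \times \mathcal{Y}^\infty$ such
that

(a) for each $\theta \in \Theta_\Pi$, $P_{\theta,\infty}(E_\theta^\Pi) = 1$,

(b) $E_\theta^\Pi \cap E_{\theta'}^\Pi = \emptyset$ for $\theta \ne \theta'$.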

The two sets of statements are equivalent. 
Proof. Suppose (1) holds. Then it is easy to verify that the set

$$E^\Pi = \{(\theta, \mathbf{y}) : \Pi^*(\cdot \mid \mathbf{y}) = \delta_\theta,\ \theta \in \Theta_\Pi\}$$

is measurable and satisfies the conditions in (2).

On the other hand if (2) holds then define $\phi(\mathbf{y}) = \theta$ if $(\theta, \mathbf{y}) \in E^\Pi$. Then, using
a result from set theory [28, Theorem 4.5.7], it can be shown that $\phi$ is measurable.
It is easy to verify that $\tilde{\Pi}$ defined by

$$\tilde{\Pi}(\cdot \mid \mathbf{y}) = \delta_{\phi(\mathbf{y})}$$

is a version of the conditional distribution of $\theta$ given $\mathbf{Y}$ and hence $\Pi^*(\cdot \mid \mathbf{y}) =
\tilde{\Pi}(\cdot \mid \mathbf{y})$ a.e. $\lambda_\Pi$. An application of Fubini's theorem yields the result. □

Our interest is in establishing (1) for all priors II and it is convenient to work 
with a stronger version of (2) by seeking a decomposition that does not depend 
upon II. Formally, 

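Proposition 7.5. Let $\Pi$ be a prior for $\Theta$.

Suppose there exists a measurable set $E \subset \Theta \times \mathcal{Y}^\infty$ such that

1. For each $\theta \in \Theta$, $P_{\theta,\infty}(E_\theta) = 1$ where $E_\theta$ is the $\theta$-section $\{\mathbf{y} : (\theta, \mathbf{y}) \in E\}$.

2. $E_\theta \cap E_{\theta'} = \emptyset$ for $\theta \ne \theta'$.

Then there is a set $\Theta_\Pi$ with $\Pi$ measure 1, such that the posterior is consistent
at all $\theta \in \Theta_\Pi$.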

Thus, Doob's theorem is intimately related to uniform orthogonality of $\{P_{\theta,\infty} :
\theta \in \Theta\}$. There is a wide literature on singularity and mutual absolute continuity of
measures on infinite product spaces ([27] and [5]). This literature in general deals
with pairwise orthogonality, whereas Proposition 7.5 requires uniform orthogonality.
The step from pairwise to uniform orthogonality can be formidable. Yet we feel that
some of these results are likely to be useful in establishing Doob-type theorems in
the non-i.i.d. setup.

Motivated by Proposition 7.5, we present an example where the Doob-type theorem
fails to hold. On the positive side, Proposition 7.5 enables us to prove a theorem
for linear regression models with nonparametric errors.
The case that we consider is

$$Y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1, 2, \ldots$$

where

1. $x_1, x_2, \ldots$ are fixed non-random design points.

2. $\epsilon_1, \epsilon_2, \ldots$ are independent and identically distributed random variables with a
probability density symmetric around 0.

Example 7.6. Suppose $\sum_i x_i^2 < \infty$ and $\epsilon_i \sim N(0, 1)$, and let $\alpha = 0$. In this case
it follows from a result of Shepp [27] that the product measures $\prod_{i=1}^\infty N(\beta x_i, 1)$, for
different values of $\beta$, are mutually absolutely continuous. Hence the decomposition required
by Proposition 7.4 fails and Doob's theorem cannot hold.
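As a rough numerical illustration of Example 7.6, the sketch below assumes a conjugate $N(0, \text{prior\_var})$ prior on $\beta$, which is not part of the original example; the function names and the two designs are hypothetical. Under this assumption the posterior variance of $\beta$ is $1/(1/\text{prior\_var} + \sum_i x_i^2)$, so it stays bounded away from zero when $\sum_i x_i^2 < \infty$ and shrinks to zero when the sum diverges.

```python
import math


def posterior_variance(xs, prior_var=1.0):
    """Posterior variance of beta when Y_i ~ N(beta * x_i, 1) and,
    as an illustrative assumption, beta ~ N(0, prior_var)."""
    precision = 1.0 / prior_var + sum(x * x for x in xs)
    return 1.0 / precision


def summable_design(n):
    """Hypothetical design x_i = 1/i, for which sum x_i^2 converges to pi^2/6."""
    return [1.0 / i for i in range(1, n + 1)]


def spread_design(n):
    """Hypothetical design alternating between +1 and -1, so sum x_i^2 diverges."""
    return [1.0 if i % 2 else -1.0 for i in range(1, n + 1)]


# Limiting posterior variance for the summable design with prior_var = 1.
limit = 1.0 / (1.0 + math.pi ** 2 / 6.0)

for n in (10, 1000, 100000):
    print(n, posterior_variance(summable_design(n)), posterior_variance(spread_design(n)))
```

With the summable design the printed variance levels off near `limit`, so under this prior the posterior cannot converge to a point mass, whatever the data.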

The last example we consider is semiparametric regression, the linear regression 
model where the distribution of the noise is assumed to be unknown and thus 
needs to be estimated. This example has been investigated in terms of posterior 
consistency, following from the generalization of the Schwartz theorem in Amewou- 
Atisso et al. [1]. We revisit this example in Theorem 7.7 and show that the Doob- 
type theorem holds with an assumption on the fixed non-random design points, 
similar to that of [1]. 

Let Assumption A be defined as follows: There exists $\epsilon_0 > 0$ such that the
covariate values $x_i$ satisfy

$$\sum_i I_{(-\infty, -\epsilon_0)}(x_i) = \infty \quad \text{and} \quad \sum_i I_{(\epsilon_0, \infty)}(x_i) = \infty.$$

Theorem 7.7. Consider the model

$$Y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1, 2, \ldots$$

where

1. $x_1, x_2, \ldots$ are fixed nonrandom design points.

2. $\epsilon_1, \epsilon_2, \ldots$ are i.i.d. variables with an unknown distribution whose density $f$ is
symmetric, continuous at 0 and satisfies $f(0) > 0$.

If Assumption A holds, then given any prior $\Pi$ for $(\alpha, \beta, f)$, there is a set $\Theta_\Pi$ of $\Pi$
measure 1 such that the posterior is consistent at all $(\alpha, \beta, f) \in \Theta_\Pi$.

Proof. Let $\mathcal{F}$ be the set of all densities $f$ on the real line which are symmetric, continuous
at 0 and satisfy $f(0) > 0$. Formally, we have as the parameter space $\Theta = \mathbb{R} \times \mathbb{R} \times \mathcal{F}$
and given $(\alpha, \beta, f)$, the $Y_i$'s are independent with $Y_i \sim f_{\alpha+\beta x_i}$, where $f_{\alpha+\beta x_i}(y) =
f(y - (\alpha + \beta x_i))$.

We now construct a decomposition satisfying the conditions of Proposition 7.5.

Let $N_1 = \{n_1, n_2, \ldots\}$ be the subsequence of all $i$ with $x_i > \epsilon_0$ and $M_1 =
\{m_1, m_2, \ldots\}$ be the subsequence of all $i$ with $x_i < -\epsilon_0$. Let $t$ be a real number
and define $A_t = (t, \infty)$.

Note that the unknown parameter $\theta$ is the triple $\theta = (\alpha, \beta, f)$. Following the notation
in Proposition 7.5, let $\Pi$ be a prior on $\Theta$ and let $E^\Pi$ be the set of all $(\alpha, \beta, f, y)$
such that for any real number $t$,

1. $\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} I_{A_t}(y_i - (\alpha + \beta x_i)) = P_f(A_t)$.

2. $\lim_{k\to\infty} \frac{1}{k} \sum_{i=1}^{k} I_{A_t}(y_{n_i} - (\alpha + \beta x_{n_i})) = P_f(A_t)$.



3. $\lim_{k\to\infty} \frac{1}{k} \sum_{i=1}^{k} I_{A_t}(y_{m_i} - (\alpha + \beta x_{m_i})) = P_f(A_t)$.

Since $N_1$, $M_1$ are fixed subsequences and since it is enough to work with rational
$t$, $E^\Pi$ is easily seen to be measurable.

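Further, for each $(\alpha, \beta, f)$, $\left[\prod_{i=1}^{\infty} P_{\alpha+\beta x_i}\right](E^\Pi_{\alpha,\beta,f}) = 1$, where $E^\Pi_{\alpha,\beta,f}$ is the
$(\alpha, \beta, f)$-section $\{y : (\alpha, \beta, f, y) \in E^\Pi\}$ for each $(\alpha, \beta, f)$ as defined in Proposition
7.5. This follows by noting that under $\prod_{i=1}^{\infty} P_{\alpha+\beta x_i}$, $Y_1 - (\alpha + \beta x_1), Y_2 - (\alpha + \beta x_2), \ldots$
are i.i.d. with common density $f$. An application of the law of large numbers proves
the claim.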

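We next argue that if $(\alpha_1, \beta_1, f_1) \neq (\alpha_2, \beta_2, f_2)$ then $E^\Pi_{\alpha_1,\beta_1,f_1} \cap E^\Pi_{\alpha_2,\beta_2,f_2} = \emptyset$.

If $\alpha_1 = \alpha_2$, $\beta_1 = \beta_2$ and $f_1 \neq f_2$, and if $y \in E^\Pi_{\alpha_1,\beta_1,f_1} \cap E^\Pi_{\alpha_2,\beta_2,f_2}$, then a contradiction
is easily obtained by considering a $t$ for which $P_{f_1}(A_t) \neq P_{f_2}(A_t)$.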

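Now suppose that for some $\Delta > 0$, $\alpha_1 - \alpha_2 > \Delta$ and $\beta_1 - \beta_2 > \Delta$. Clearly for every
$n_i \in N_1$, $(\beta_1 - \beta_2) x_{n_i} > \Delta \epsilon_0$. Choose $\eta$ such that $0 < \eta < \Delta \epsilon_0$ and $\inf_{|x| < \eta} f_1(x) > C > 0$.

Since $f_1$ is symmetric and puts mass at least $C\eta > 0$ on $(-\eta, 0)$, $P_{f_1}(A_{-\eta}) > 1/2$. We will get a contradiction by
showing that if $y \in E^\Pi_{\alpha_1,\beta_1,f_1} \cap E^\Pi_{\alpha_2,\beta_2,f_2}$, then $P_{f_1}(A_{-\eta}) \leq 1/2$.

If $y \in E^\Pi_{\alpha_1,\beta_1,f_1} \cap E^\Pi_{\alpha_2,\beta_2,f_2}$, then for all $t$,

(7.3) $\quad \frac{1}{k} \sum_{i=1}^{k} I_{A_t}(y_{n_i} - (\alpha_1 + \beta_1 x_{n_i})) \to P_{f_1}(A_t)$,

(7.4) $\quad \frac{1}{k} \sum_{i=1}^{k} I_{A_t}(y_{n_i} - (\alpha_2 + \beta_2 x_{n_i})) \to P_{f_2}(A_t)$.

Now

$$\alpha_1 + \beta_1 x_{n_i} = \alpha_2 + \beta_2 x_{n_i} + (\alpha_1 - \alpha_2) + (\beta_1 - \beta_2) x_{n_i} > \alpha_2 + \beta_2 x_{n_i} + \eta$$

and hence, since $u \mapsto I_{A_t}(u)$ is nondecreasing,

$$I_{A_t}(y_{n_i} - (\alpha_1 + \beta_1 x_{n_i})) \leq I_{A_t}(y_{n_i} - (\alpha_2 + \beta_2 x_{n_i} + \eta)) = I_{A_{t+\eta}}(y_{n_i} - (\alpha_2 + \beta_2 x_{n_i})).$$

In particular with $t = -\eta$,

$$I_{A_{-\eta}}(y_{n_i} - (\alpha_1 + \beta_1 x_{n_i})) \leq I_{(0,\infty)}(y_{n_i} - (\alpha_2 + \beta_2 x_{n_i})).$$

Consequently

$$\frac{1}{k} \sum_{i=1}^{k} I_{A_{-\eta}}(y_{n_i} - (\alpha_1 + \beta_1 x_{n_i})) \leq \frac{1}{k} \sum_{i=1}^{k} I_{(0,\infty)}(y_{n_i} - (\alpha_2 + \beta_2 x_{n_i})) \to P_{f_2}(0, \infty) = \frac{1}{2}.$$

By (7.3) the left-hand side converges to $P_{f_1}(A_{-\eta})$, so $P_{f_1}(A_{-\eta}) \leq 1/2$, which is the desired contradiction.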

The case when $\alpha_1 - \alpha_2 < -\Delta$, $\beta_1 - \beta_2 > \Delta$ can be handled by considering the
subsequence $M_1$. Similarly, the other remaining cases follow. □
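The separation argument in the proof can be checked numerically. The sketch below is illustrative only: it assumes standard normal errors, a design alternating between $+1$ and $-1$, and $\epsilon_0 = 0.5$, none of which come from the theorem, and the function names are hypothetical. Along the subsequence $N_1$ it compares the frequency of residuals in $A_0 = (0, \infty)$ under the true $(\alpha, \beta)$ with the same frequency under a shifted pair.

```python
import random


def exceedance_frequency(ys, xs, alpha, beta, t=0.0):
    """Fraction of residuals y_i - (alpha + beta * x_i) falling in A_t = (t, inf)."""
    hits = sum(1 for y, x in zip(ys, xs) if y - (alpha + beta * x) > t)
    return hits / len(ys)


def simulate(n, alpha, beta, seed=0):
    """Draw Y_i = alpha + beta * x_i + e_i with e_i ~ N(0, 1) (an assumption)
    on a hypothetical design alternating between +1 and -1."""
    rng = random.Random(seed)
    xs = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
    ys = [alpha + beta * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


xs, ys = simulate(20000, alpha=1.0, beta=2.0)

# Subsequence N_1: indices with x_i > eps_0, taking eps_0 = 0.5 for illustration.
pairs = [(x, y) for x, y in zip(xs, ys) if x > 0.5]
xs_pos = [x for x, _ in pairs]
ys_pos = [y for _, y in pairs]

# Under the true parameters the residuals are symmetric about 0.
freq_true = exceedance_frequency(ys_pos, xs_pos, alpha=1.0, beta=2.0)
# Under (alpha, beta) = (0, 1) the residuals along N_1 are shifted up by 2.
freq_wrong = exceedance_frequency(ys_pos, xs_pos, alpha=0.0, beta=1.0)

print(freq_true, freq_wrong)
```

The first frequency settles near $1/2$ while the second stays well above it, so a single sequence $y$ cannot satisfy the limiting conditions for both parameter values. This is the disjointness of sections that Proposition 7.5 requires.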